ABSTRACT

The predictive and incremental validity of the New 
Medical College Admission Test (New MCAT) Science Problems Subtest 
was examined with a sample of over 165 medical students. Criterion 
measures were National Board of Medical Examiners (NBME) Part I 
(basic science) and Part II (clinical science) performance. The 
Science Problems subscore is derived from a subset of the same items 
found on the Biology, Chemistry, and Physics subtests, creating 
nonindependence. Results of incremental validity analyses and of all 
possible subsets regression analyses using Mallows' Cp criterion 
raise questions concerning the practical utility of the Science 
Problems subtest in prediction equations to make admission decisions. 
Cross-validation analyses supported the inclusion of the Biology 
subtest in prediction models of both NBME Parts I and II, and of the 
Chemistry subtest for NBME Part I. (Author) 
Predictive and Incremental Validity of the New MCAT Science Problems Subtest

The Association of American Medical Colleges' revised standardized test, designed to
evaluate the academic preparation of applicants to medical school, was first used in 1978.
This version, the New Medical College Admission Test (New MCAT), differs in several
respects from the Old MCAT. "Specifically, the Skills Analyses and Science Problems
subtests of the New MCAT assess such abilities as information gathering and analysis,
discerning and formulating relationships and other problem solving skill dimensions in
their respective areas. These cognitive areas were not directly measured by the Old
MCAT" (New MCAT Interpretive Manual, 1977). Dawson-Saunders and Doolen (1981) and
Jones and Thomae-Forgues (1981) discussed the New MCAT's potential value as a
predictor of clinical performance. Because of the increased emphasis on interpretation and
problem solving in the new format, they suggested that the New MCAT may yield
measures that are more closely associated with the information gathering, evaluation,
and utilization skills required during the clinical experience. A number of studies have
compared the ability of the Old and New MCAT to predict student achievement in
medical school. Erdmann (1980) characterized the results of the "first round" of New
MCAT studies as encouraging.

Because the scores on the Science Problems subtest are derived from a subset of the
items that comprise three other New MCAT subtests (Biology, Chemistry, and Physics),
this subtest is by definition linearly dependent upon these other subtests. Thus, while
"scores on the six New MCAT areas of assessment are designed to be relatively
independent and are purposefully reported separately. . . . items from the Science
Problems subtest contribute twice to New MCAT scores" (New MCAT Interpretive
Manual, 1977). This issue has been addressed in several New MCAT validity studies (Hull,
Calhoun & Maxim, 1981; Jones & Thomae-Forgues, 1981) by excluding the Science
Problems subtest from multivariate analyses, while it has been included in other studies
(Friedman & Bakewell, 1980; Friedman & Porter, 1981; McGuire, 1980; Molidor & Elstein,
1979). Psychometrically, the problem is that the Science Problems subtest shares the
error component of the other subtests, violating the assumption of uncorrelated
error variance and raising serious interpretative questions in multivariate analyses such as
factor analysis (Gorsuch, 1974). When independent variables such as these are highly
correlated in multiple regression analyses, "not only do the estimated regression
coefficients tend to be quite imprecise, but the true regression coefficients tend to lose
their meaning" (Neter & Wasserman, 1974). On the other hand, multicollinear variables
have been included in the same analyses when a strong rationale for their inclusion has been
given. In a recent re-examination of the relevance of MCAT science content, neither the
Science Problems subtest nor this issue of non-independence was discussed (Wilson, 1982).
It is likely that the Science Problems subtest has been included in prediction equations
used to make admission decisions at many medical schools. The purpose of the present
study was to examine the usefulness of the New MCAT Science Problems subtest in
predicting medical student basic and clinical science performance.

Methodology 

Instrumentation and Sampling 

Scores for the entering class of 1978 at a large midwestern university medical
school were obtained for student performance on the six New MCAT subtests (Biology,
Chemistry, Physics, Science Problems, Skills Analysis: Quantitative, Skills Analysis:
Reading) and the examinations of the National Board of Medical Examiners (NBME),
Parts I and II. NBME scores (NBME, 1982) represent the criterion medical school
performance measures examined in the study. NBME Part I assesses basic science
achievement, while NBME Part II assesses clinical science achievement.

Subjects were medical students in the 1982 graduating class at The University of
Michigan Medical School. Because of missing data, total sample sizes were 186 subjects
for the NBME Part I analyses and 167 subjects for the NBME Part II analyses. Subjects
were randomly divided into two sub-samples, a screening sample and a calibration sample,
in order to cross-validate the results obtained in the multiple correlation/regression
analyses (Kerlinger & Pedhazur, 1973; Lord & Novick, 1968) described in the following
section. All data were analyzed for each sub-sample independently and again for the
total combined sample.

Capitalization on chance in the development of a regression/prediction model based
on sample correlations is a well known problem (Lord & Novick, 1968). Because these
sample correlations reflect not only the true correlations among the variables but also
sampling error, the multiple correlation typically "shrinks" when these variables
are used on a new sample. Both Lord and Novick (1968) and Kerlinger and Pedhazur (1973)
recommend cross-validation procedures to address this problem. Cross-validation
necessitates obtaining two samples. The first sample, referred to as the screening
sample, is used to develop the regression equation and multiple $R^2$. The predictor
variables of the second sample, referred to as the calibration sample, are then entered into
the regression equation obtained from the screening sample to obtain predicted scores for
the criterion variable. The observed criterion scores ($y$) for the calibration sample are
then correlated with the predicted criterion scores ($y'$). This Pearson $r_{yy'}$ is analogous to
a multiple correlation between the observed and predicted scores. In the present study,
this procedure was applied twice in order to allow each sub-sample to serve in turn as the
screening sample and as the calibration sample. This "double cross-validation procedure is
strongly recommended as the most rigorous approach to the validation of results from
regression analysis in a predictive framework" (Kerlinger & Pedhazur, 1973, p. 284). The
two regression equations, multiple $R^2$s, and $r_{yy'}$s obtained from alternate samples were
then compared. Analyses of the data were performed retrospectively and were not used
in making admission decisions.
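
The double cross-validation procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are synthetic, the helper names (`fit_ols`, `predict`, `r_squared`, `double_cross_validate`) are hypothetical, and NumPy's least-squares routine stands in for whatever regression program was actually used in the study.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) for y regressed on X."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict(coef, X):
    """Predicted criterion scores y' from a fitted equation."""
    return np.column_stack([np.ones(X.shape[0]), X]) @ coef

def r_squared(y, y_hat):
    """Multiple R^2 of the fitted equation in the sample it was fitted on."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def double_cross_validate(X1, y1, X2, y2):
    """Let each half serve once as screening and once as calibration sample."""
    designs = {
        "screen_1_calibrate_2": (X1, y1, X2, y2),
        "screen_2_calibrate_1": (X2, y2, X1, y1),
    }
    results = {}
    for name, (Xs, ys, Xc, yc) in designs.items():
        coef = fit_ols(Xs, ys)                       # equation from screening sample
        r2_screen = r_squared(ys, predict(coef, Xs))
        # r_yy': observed vs. predicted scores in the calibration sample
        r_cross = np.corrcoef(yc, predict(coef, Xc))[0, 1]
        results[name] = (r2_screen, r_cross)
    return results

# Synthetic illustration only (NOT the study data): two predictors, 186 cases.
rng = np.random.default_rng(0)
n = 186
X = rng.normal(size=(n, 2))
y = 0.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(size=n)
order = rng.permutation(n)                           # random split into two halves
half1, half2 = order[:92], order[92:]
cv = double_cross_validate(X[half1], y[half1], X[half2], y[half2])
for name, (r2, r) in cv.items():
    print(name, "R^2 =", round(r2, 2), " r_yy' =", round(r, 2))
```

Comparing the squared $r_{yy'}$ from each calibration sample with the multiple $R^2$ from the corresponding screening sample gives a rough picture of shrinkage.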

Correlational and Incremental Validity Analyses

Pearson zero order correlations were computed to test the research hypotheses of a
significant positive relationship between each of the MCAT subscores and the two
criterion performance measures. Incremental validity (Lord & Novick, 1968; Sechrest,
1963) was examined by using a step-wise, hierarchical multiple regression analysis design
involving a two step procedure. In the first phase, all MCAT subtest scores except
Science Problems were included in the analysis. The Science Problems subscores were
then included in the second phase of the analysis by stepping them into the equation after
the non-Science Problems subtests had been stepped in. Two separate analyses were
performed, one for each of the criterion measures. These analyses permitted an
examination of the usefulness of the Science Problems subtest in explaining additional
variance in the criterion measures beyond that already explained by the other MCAT
subtests.

Three separate indexes of MCAT incremental validity were calculated. The first
index indicates the absolute amount of variance (as measured by multiple $R^2$) explained
for each of the two criterion measures by the Science Problems subtest scores when they
are stepped into the multiple regression analysis after all the non-Science Problems
MCAT subtests have been included (Sechrest, 1963). This index was determined using
formula 1.

$$\text{Index 1} = (R^2 \text{ for all variables}) - (R^2 \text{ for non-Science Problems MCAT variables}) = R^2 \text{ added by Science Problems MCAT subtest} \tag{1}$$

The second index (Friedman & Porter, 1981) provides a measure of the proportional
increase in performance variance explained by stepping in MCAT Science Problems
subtest scores last in the regression analysis and was calculated using formula 2 below.

$$\text{Index 2} = \frac{R^2 \text{ added by Science Problems MCAT}}{R^2 \text{ for non-Science Problems MCAT variables}} \tag{2}$$

The third index provides a measure of the proportional increase in performance 
variance that is unaccounted for by the non-Science Problems MCAT subtests and that is 
explained by adding the Science Problems MCAT scores to the regression analysis 
(Friedman & Porter, 1981). This index was calculated using formula 3 below. 

$$\text{Index 3} = \frac{R^2 \text{ added by Science Problems MCAT}}{1 - (R^2 \text{ for non-Science Problems MCAT variables})} \tag{3}$$

Both indexes 2 and 3 were calculated in order to minimize artifactual differences in
the incremental validity results (Friedman & Porter, 1981) of the Science Problems subtest
for the two samples due to differing multiple $R^2$s or differing amounts of unexplained
variance available (i.e., not explained by the non-Science Problems subtests).
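
As a worked illustration of formulas 1 through 3, the short sketch below computes the three indexes from a pair of multiple $R^2$ values. The function name `incremental_validity` is hypothetical; the example values (.21 without and .22 with Science Problems) are those reported in the Results for Sample 2 on NBME Part II.

```python
def incremental_validity(r2_all, r2_without):
    """Return indexes 1-3 for a predictor stepped into the equation last.

    r2_all     -- multiple R^2 with all subtests in the equation
    r2_without -- multiple R^2 with the last-entered subtest left out
    """
    added = r2_all - r2_without              # formula 1: absolute R^2 added
    index1 = added
    index2 = added / r2_without              # formula 2: relative to explained variance
    index3 = added / (1.0 - r2_without)      # formula 3: relative to unexplained variance
    return index1, index2, index3

i1, i2, i3 = incremental_validity(0.22, 0.21)
print(round(i1, 2), round(i2, 2), round(i3, 2))  # 0.01 0.05 0.01
```

Index 2 grows when the other subtests explain little variance, while index 3 grows when little variance is left to explain, which is why the two are reported together.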

All Possible Subsets Regression Analyses 

All possible subsets regression analyses (Frane, 1981) including all six New MCAT
subtests are reported for each of the criterion measures. "The only way to be sure of
obtaining the best n of N predictors would be to determine the multiple correlation for
every such set" by using an exhaustive procedure (Lord & Novick, 1968, p. 288). Until
recently the economic cost of performing such analyses was prohibitive. However, "one
major advance of the past decade in multiple regression has been the replacement of
stepwise procedures with all possible subset searches for model selection, served by the
Cp plot" (Wainer & Thissen, 1981, p. 213). Use of the Furnival-Wilson (1974) algorithm
enables the identification of "subsets while computing only a small fraction of all possible
regressions. Computer costs are comparable for stepwise regression for up to about 25
independent variables" (Frane, 1981, p. 264).

Mallows' Cp was the criterion used to identify the best subsets. The "best" subset is
selected on the basis of an analysis of residuals that minimizes Cp based on the following
formula (Daniel & Wood, 1971; Frane, 1981):

$$C_p = \frac{RSS}{s^2} - (N - 2p') \tag{4}$$

where

$RSS$ = residual sum of squares for the subset of independent variables being tested
$s^2$ = residual mean square based on the regression using all independent variables
$p'$ = the number of variables in the subset, including the intercept, if any
$N$ = number of cases (sample size)



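In addition, multiple $R^2$s and adjusted $R^2$s based on formula 5 were calculated.

$$\text{Adjusted } R^2 = R^2 - \frac{p\,(1 - R^2)}{N - p'} \tag{5}$$

where $p$ = the number of independent variables when the intercept is set to zero, and
$p'$ = the number of variables in the subset, including the intercept, if any.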



These analyses enabled an examination of which subtests, the Science Problems subtest 
and/or other subtests, were included in the "best" regression model for each criterion. 
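
A minimal sketch of an all possible subsets search using formulas 4 and 5 follows. It is an illustration only: it enumerates every subset by brute force rather than using the Furnival-Wilson algorithm, the data are synthetic, and the function names (`residual_ss`, `all_subsets`) are hypothetical rather than taken from the program used in the study. An intercept is fitted in every model, so $p' = p + 1$.

```python
from itertools import combinations

import numpy as np

def residual_ss(columns, X, y):
    """Residual sum of squares for y regressed on the given columns plus intercept."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in columns])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def all_subsets(X, y, names):
    """Return (Cp, subset names, R^2, adjusted R^2) for every subset, best Cp first."""
    n, k = X.shape
    s2 = residual_ss(range(k), X, y) / (n - (k + 1))  # residual mean square, full model
    tss = float(np.sum((y - y.mean()) ** 2))
    rows = []
    for size in range(1, k + 1):
        for cols in combinations(range(k), size):
            rss = residual_ss(cols, X, y)
            p, p_prime = size, size + 1               # p' counts the intercept
            cp = rss / s2 - (n - 2 * p_prime)         # formula 4
            r2 = 1.0 - rss / tss
            adj_r2 = r2 - p * (1.0 - r2) / (n - p_prime)  # formula 5
            rows.append((cp, [names[j] for j in cols], r2, adj_r2))
    return sorted(rows, key=lambda row: row[0])

# Synthetic illustration only (NOT the study data): the criterion depends on BI and CH.
names = ["BI", "PH", "CH", "SP", "RE", "QA"]
rng = np.random.default_rng(1)
X = rng.normal(size=(186, 6))
y = 1.0 * X[:, 0] + 0.8 * X[:, 2] + rng.normal(size=186)
results = all_subsets(X, y, names)
for cp, subset, r2, adj_r2 in results[:3]:
    print(round(cp, 2), subset, round(r2, 2), round(adj_r2, 2))
```

For the full model, formula 4 always gives $C_p = p'$, so subsets with $C_p$ near or below their own $p'$ are the ones worth comparing.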

Results and Discussion

Intercorrelations among all six New MCAT subtests and NBME Part I scores are
summarized in Table 1 for both subsamples. Similar correlations are presented in Table 2
for the NBME Part II analyses. Table 3 contains intercorrelations for all subjects (i.e., both
subsamples combined). The collinearity was greatest for the relationships of the Science
Problems subtest with the Biology, Physics, and Chemistry subtests (r's ranged between
.56 and .72). Not surprisingly, the content non-independence resulting from the use of
selected items from the Biology, Physics, and Chemistry subtests in the construction of
the Science Problems subtest is confirmed by the magnitude of these correlations.
Because the largest amount of shared variance is 52%, it could be argued that there is
sufficient non-overlap to justify the inclusion of the Science Problems subtest on
theoretical grounds. Correlation coefficients involving Science Problems with the NBME
measures were exceeded in magnitude only by the Biology subtest correlation
coefficients, except in subsample 2, where the Science Problems-NBME Part I
correlation actually exceeded the Biology correlation (.38 versus .35). These Pearson
correlations between Science Problems and NBME measures ranged between .38 and .55.
Correlation coefficients in general were higher in sample 1 than in sample 2. This could
possibly reflect greater variability among NBME Part I and II scores among subjects in
sample 1 than in sample 2.
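
The shared-variance figure quoted above is simply the squared correlation. A small sketch, with a hypothetical helper name, applied to the two ends of the reported range of r's:

```python
def shared_variance_percent(r):
    """Percentage of variance two measures share, given their Pearson r."""
    return 100.0 * r ** 2

for r in (0.56, 0.72):
    print(r, round(shared_variance_percent(r)))  # 31 and 52 percent
```

Even at the upper end of the range, roughly half of the variance in Science Problems scores is not shared with the subtest it overlaps most.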

Table 1

Pearson Correlations Among New MCAT Subtests and
NBME Part I Scores

Sample 1 (n=92) above the diagonal; Sample 2 (n=94) below the diagonal

            BI     PH     CH     SP     RE     QA     NBME I
MCAT-BI     --     .4?    [?]    .66    .26    .42    .62
MCAT-PH     .31    --     .62    .65    .23    .36    .47
MCAT-CH     .30    .52    --     .72    .31    [?]    .49
MCAT-SP     .57    .61    .59    --     .25    .4?    .55
MCAT-RE     .12    .27    .28    .33    --     .26    .30
MCAT-QA     .25    .37    .38    [?]    .25    --     .33
NBME I      .35    .29    .35    .38    .17    .24    --

Note: MCAT= New Medical College Admission Test; BI= Biology; PH= Physics;
CH= Chemistry; SP= Science Problems; RE= Reading; QA= Quantitative.

All correlations greater than .205 or .267 are statistically significant at alpha=.05
or alpha=.01, respectively (df=90).



Table 2

Pearson Correlations Among New MCAT Subtests and
NBME Part II Scores

Sample 1 (n=81) above the diagonal; Sample 2 (n=86) below the diagonal

            BI     PH     CH     SP     RE     QA     NBME II
MCAT-BI     --     .50    .52    .68    .28    [?]    .56
MCAT-PH     .29    --     .61    [?]    .17    .37    .37
MCAT-CH     .28    .50    --     .73    .30    .43    .36
MCAT-SP     .58    .59    .56    --     .22    .46    .43
MCAT-RE     .16    .27    .28    .34    --     .2?    .40
MCAT-QA     .27    .40    .38    .49    .26    --     .43
NBME II     .37    .21    .3?    .39    .21    .23    --

Note: MCAT= New Medical College Admission Test; BI= Biology; PH= Physics;
CH= Chemistry; SP= Science Problems; RE= Reading; QA= Quantitative.

All correlations greater than .217 or .283 are statistically significant at
alpha=.05 or alpha=.01, respectively (df=80).

Table 3

Pearson Correlations Among New MCAT Subtests and
NBME Part I and II Scores for All Subjects

NBME I (n=186) above the diagonal; NBME II (n=167) below the diagonal

            BI     PH     CH     SP     RE     QA     NBME I
MCAT-BI     --     [?]    [?]    .63    .20    .34    .51
MCAT-PH     .39    --     .57    .62    .25    .37    .38
MCAT-CH     [?]    .56    --     .66    .30    .39    .43
MCAT-SP     .64    .61    .65    --     .28    [?]    .4?
MCAT-RE     .22    .22    .29    .27    --     .26    .24
MCAT-QA     .36    .38    .41    .47    .25    --     .29
NBME II     [?]    .29    .36    .42    .32    .35    --

Note: MCAT= New Medical College Admission Test; BI= Biology; PH= Physics;
CH= Chemistry; SP= Science Problems; RE= Reading; QA= Quantitative.

All correlations are statistically significant (p < .01).

Incremental Validity Results

Multiple $R^2$s indicated that all six New MCAT subtests accounted for 45%, 20%, and
33% of the variance in NBME Part I scores for subsample 1, subsample 2, and the
combined sample, respectively. In all but one of the incremental validity analyses
reported in Table 4, Science Problems did not explain any additional or incremental
variance in NBME measures beyond that explained by the other five New MCAT subtests.
The one exception occurred in Sample 2 for NBME Part II, where the multiple $R^2$ improved
from .21 to .22 with the addition of Science Problems. Using formula 1, Science Problems
explained only 1% additional variance in this instance, for a 5% proportional increase in
performance variance explained (formula 2) and 1% of the variance unaccounted for
by the other New MCAT subtests (formula 3). In general, these incremental validity
analyses raise doubts concerning the practical utility of the Science Problems subtest in
explaining variability among NBME performance not accounted for by the other subtests.



Table 4

Incremental Validity for New MCAT Science Problems Subtest

Criterion                                                                    All
Measure    Statistic                                 Sample 1   Sample 2   Subjects

NBME I     Sample size (n)                              92         94        186
           $R^2$ non-Sci. Prob. MCAT                    .45        .20       .33
           $R^2$ added by Sci. Prob. MCAT (1)           .00        .00       .00
           Total $R^2$                                  .45        .20       .33
           Incremental Validity (2)                     .00        .00       .00
           Incremental Validity (3)                     .00        .00       .00

NBME II    Sample size (n)                              81         86        167
           $R^2$ non-Sci. Prob. MCAT                    .41        .21       .31
           $R^2$ added by Sci. Prob. MCAT (1)           .00        .01       .00
           Total $R^2$                                  .41        .22       .31
           Incremental Validity (2)                     .00        .05       .00
           Incremental Validity (3)                     .00        .01       .00


All Possible Subsets Regression Results 



These analyses were performed to examine whether the Science Problems subtest
was a component of the best regression models for predicting NBME performance. Based
on the selection criterion of minimizing the Cp statistic for residuals, the following
standardized regression models were obtained for sample 1 (equation 6) and sample 2
(equation 7):

NBME I.1 = .49 Biology + .26 Chemistry + 1.18          (6)

NBME I.2 = .26 Biology + .28 Chemistry + 3.13          (7)

Even though there are differences in the beta weights between the two models,
there is striking similarity between them. These results support the cross-validity and
plausibility of a prediction model for NBME Part I scores that includes only the Biology and
Chemistry subtests. The two subsamples were then combined to provide a more stable
regression equation (Kerlinger & Pedhazur, 1973; Mosier, 1951), which is presented below.

NBME I = .39 Biology + .25 Chemistry + .09 Reading + [?]          (8)

This model was selected based on having the lowest Cp value (3.31). However, the
model comprised of just Chemistry and Biology resulted in a Cp value of 3.36. Given
Frane's (1981) recommendation that only independent variables whose coefficients
are significantly different from zero be retained, it is unlikely that adding the Reading
subtest would result in predictions substantially different from those obtained by excluding
it from the model (the beta coefficient of .09 was not statistically significant, p < .16).

The regression models obtained for NBME Part II performance for sample 1
(equation 9) and sample 2 (equation 10) contained both similarities and differences.

NBME II.1 = [?] Biology + .2? Reading + .20 Quantitative - .35          (9)

NBME II.2 = .30 Biology + .26 Chemistry + 2.40          (10)

It seems clear that Biology is a good predictor and should be included in the model.
Results for the Reading, Quantitative, and Chemistry subtests are ambiguous, as their
contributions were not cross-validated. Combining both subsamples resulted in the
following regression model:

NBME II = .35 Biology + .18 Reading + .13 Quantitative + .11 Chemistry + .45          (11)

Not surprisingly, the best model for all subjects included all four subtests included in
equations 9 and 10. Neither the beta weight for Quantitative (p < .08) nor that for Chemistry
(p < .16) was statistically significant. Thus, while the best model for predicting NBME
Part II scores is not clear based on these analyses, it is clear that Science Problems is not
one of the plausible predictors under consideration.

Table 5 summarizes the Cp, multiple $R^2$, adjusted $R^2$, and $r_{yy'}$ values for the best
subset regression models reported above. The $r_{yy'}$ coefficient of .66 was obtained by
correlating sample 1 (calibration sample) subjects' observed scores with their predicted
scores based on the model derived with sample 2 (screening sample). In general, squaring
the $r_{yy'}$ coefficients from each sample and comparing them with the multiple $R^2$ or
adjusted $R^2$ coefficients from the same sample indicates striking similarity and
consistency, particularly for NBME Part I. The difference between multiple $R^2$ for the
two samples, as well as the difference between $r_{yy'}$ coefficients, provides an estimate of
the amount of shrinkage of the multiple correlation. In general, shrinkage decreases as
sample sizes increase (Kerlinger & Pedhazur, 1973). Even though the ratio of subjects to
the number of independent variables ranged between 13.5:1 and 15.7:1 for the two
subsamples, these samples may still be considered relatively small for the types of
analyses performed. As data become available for the graduating class of 1983, it would
be useful to replicate these analyses with the entire classes of 1982 and 1983 representing
the two samples, in contrast to dividing the class of 1982 into two subsamples as reported
here.
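
The comparison described above can be illustrated with a few lines of arithmetic. The sketch assumes the Table 5 entries for the two subsamples read as multiple $R^2$ = .43, .19, .41, and .20 and $r_{yy'}$ = .66, .43, .58, and .39; the variable names are hypothetical.

```python
# (multiple R^2, cross-validated r_yy') per subsample, as read from Table 5 (assumed values)
table5 = {
    ("NBME I", "Sample 1"): (0.43, 0.66),
    ("NBME I", "Sample 2"): (0.19, 0.43),
    ("NBME II", "Sample 1"): (0.41, 0.58),
    ("NBME II", "Sample 2"): (0.20, 0.39),
}

gaps = {}
for key, (r2, r_cross) in table5.items():
    # Difference between R^2 and the squared cross-validated correlation
    gaps[key] = r2 - r_cross ** 2
    print(key, "r_yy'^2 =", round(r_cross ** 2, 2), " gap =", round(gaps[key], 2))
```

Under these assumed readings the gap is within about .01 for both NBME Part I subsamples and noticeably larger for NBME Part II, in line with the statement that the agreement is closest for Part I.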

Table 5

Mallows' Cp, Multiple $R^2$, Adjusted Multiple $R^2$, and
Cross-validated Composite Correlations ($r_{yy'}$)
for Best Subset Regression Analyses

Criterion                                    Multiple   Adjusted
Measure    Sample           n       Cp       $R^2$      $R^2$      $r_{yy'}$

NBME I     Sample 1         92     1.90       .43        .42        .66
           Sample 2         94     [?]        .19        .17        .43
           All Subjects    186     3.31       .33        .31

NBME II    Sample 1         81     1.79       .41        .39        .58
           Sample 2         86     0.93       .20        .18        .39
           All Subjects    167     3.06       .31        .29

Note: $r_{yy'}$ is the Pearson r "between the observed criterion scores (y) in the calibration
sample and the predicted criterion scores (y'). This $r_{yy'}$ is analogous to a multiple
correlation in which the equation used is the one obtained in the screening sample"
(Kerlinger & Pedhazur, 1973, p. 284).

All Multiple $R^2$, Adjusted Multiple $R^2$, and $r_{yy'}$ correlations are statistically
significant (p < .001).

Conclusions

Results of cross-validation analyses support the inclusion of Biology and Chemistry
subtests in prediction models for NBME Part I performance, and of Biology for Part II
performance. The contributions and utility of the Reading, Quantitative, and Chemistry
subtests for predicting Part II performance are ambiguous based on the results of this
study.

Both the results of incremental validity and the all possible subset regression 
analyses obtained in this study raise doubts concerning the usefulness of the New MCAT 
Science Problems subtest in predicting student performance on two widely used 
standardized measures of medical school basic and clinical science achievement. 
Combined with the psychometric issues raised in using nonindependent variables in 
multivariate analyses, these results suggest great care should be exercised in using the 
Science Problems subtest in making admission decisions. Certainly one study does not 
definitively resolve this issue. Replication of these findings with samples obtained from 
other medical schools using similar and different criterion medical school performance 
measures is recommended before more definitive statements are made, although the 
caveat from this study is clear. 